Conversation
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for all 4 issues found in the latest run.
- ✅ Fixed: Cluster detection always returns false in execution service
- Updated execution cluster detection to read `cluster_enabled` from the nested `info.cluster` section returned by `getInfo(['cluster'])`.
- ✅ Fixed: Accidentally committed local Claude settings and context files
- Removed both local artifact files from version control and added `.betterdb_context.md` to `.gitignore` so neither file is tracked again.
- ✅ Fixed: Baseline window uses wrong snapshot for oldest timestamp
- Made baseline window computation order-agnostic by using the minimum snapshot timestamp instead of assuming array ordering.
- ✅ Fixed: Eviction misses cancelled jobs in validation service
- Extended validation job eviction filtering to include cancelled terminal states in addition to completed and failed.
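The cluster-detection fix can be sketched as follows. The response shape of `getInfo(['cluster'])` (sections keyed by name) is taken from the summary above; the `isCluster` helper and `InfoResponse` type are illustrative names, not the actual service code:

```typescript
// Sketch of the corrected check (hypothetical names; only the nested
// info.cluster section shape is taken from the fix description).
interface InfoResponse {
  cluster?: { cluster_enabled?: string };
}

function isCluster(info: InfoResponse): boolean {
  // The buggy version read a top-level info.cluster_enabled field, which
  // is always undefined for this shape, so detection always returned false.
  return info.cluster?.cluster_enabled === '1';
}

console.log(isCluster({ cluster: { cluster_enabled: '1' } })); // true
console.log(isCluster({}));                                    // false
```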
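The baseline-window change amounts to taking the minimum snapshot timestamp instead of trusting array position; `Snapshot` and `baselineStart` below are illustrative names rather than the real types:

```typescript
interface Snapshot {
  timestamp: number; // epoch millis
}

// Oldest timestamp regardless of how the snapshot array is ordered.
// Assumes a non-empty array (Math.min over an empty spread yields Infinity).
function baselineStart(snapshots: Snapshot[]): number {
  return Math.min(...snapshots.map((s) => s.timestamp));
}

// Works even when snapshots arrive newest-first:
console.log(baselineStart([{ timestamp: 300 }, { timestamp: 100 }, { timestamp: 200 }])); // 100
```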
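The eviction fix widens the terminal-state filter to include cancelled jobs. The status names match the summary, but the set-membership approach and helper names are an assumed sketch, not the validation service's actual code:

```typescript
type JobStatus = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';

interface ValidationJob {
  id: string;
  status: JobStatus;
}

// Terminal states eligible for eviction; 'cancelled' was previously missing,
// so cancelled jobs were never evicted.
const TERMINAL_STATES: ReadonlySet<JobStatus> = new Set<JobStatus>([
  'completed',
  'failed',
  'cancelled',
]);

function evictableJobs(jobs: ValidationJob[]): ValidationJob[] {
  return jobs.filter((job) => TERMINAL_STATES.has(job.status));
}

const jobs: ValidationJob[] = [
  { id: 'a', status: 'running' },
  { id: 'b', status: 'cancelled' },
  { id: 'c', status: 'failed' },
];
console.log(evictableJobs(jobs).map((j) => j.id)); // [ 'b', 'c' ]
```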
Or push these changes by commenting:
@cursor push 2dff9c0d12
Preview (2dff9c0d12)
diff --git a/.betterdb_context.md b/.betterdb_context.md
deleted file mode 100644
--- a/.betterdb_context.md
+++ /dev/null
@@ -1,298 +0,0 @@
-# BetterDB Session Context
-_Retrieved 3 memories. Auto-generated — do not edit._
-
-## Session Memories
-- **[2026-03-19]** Stopped BetterDB monitor, drafted commit message for new autostart lifecycle code, and reviewed branch against master identifying 10 code quality findings (2 high, 5 medium, 3 low severity).
- - Decision: Stopped persistent BetterDB monitor process (PID 3166397)
- - Decision: Used git diff to identify staged and unstaged changes before drafting commit message
- - Decision: Focused commit message on the new autostart.ts file as the meaningful change
- - Decision: Ran code review against master branch for all commits on current branch
- - Decision: Triaged 10 review findings across high, medium, and low severity categories
- - Solved: BetterDB monitor running in background needed to be stopped → Identified and stopped process PID 3166397
- - Solved: Need concise commit message describing staged changes → Drafted message focusing on new autostart/stop lifecycle and connection management tools
- - Solved: Code quality issues in new MCP autostart implementation → Ran full branch review identifying 10 findings: 2 high severity (TOCTOU race, mutable env), 5 medium, 3 low
- - Open: Fix TOCTOU race condition on PID read vs process.kill in autostart.ts:51-58
- - Open: Resolve unlinkSync ENOENT error on concurrent stopMonitor calls (autostart.ts:155)
- - Open: Address empty string env var handling using || operator in runner.ts:31-42
- - Open: Add test coverage for new autostart and connection management code
- - Open: Consider whether to address setAsDefault in POST body and prefix detection timing findings
-- **[2026-03-19]** Added MCP monitor lifecycle management with start/stop tools, environment variable config overrides, and fixes for process orphaning, stale URLs, signal leaks, and resource cleanup.
- - Decision: Added environment variable overrides for database and storage config (DB_HOST, DB_PORT, STORAGE_TYPE, PORT) to take precedence over saved config
- - Decision: Implemented MCP tools for start_monitor and stop_monitor with persist flag and dynamic port/storage selection
- - Decision: Created unified apiRequest function to replace separate apiFetch and rawFetch implementations
- - Decision: Added PID file guarding with existsSync checks before writes and reads
- - Decision: Replaced sleep-based polling with port-release detection polling in monitor startup
- - Decision: Implemented ephemeral signal handler registration that cleans up after health check completion
- - Decision: Used process.kill(pid, 0) pattern for process existence verification
- - Decision: Bundled module-level BETTERDB_URL and detectedPrefix as mutable connection state
- - Decision: Added alreadyRunning flag to distinguish between fresh start and reused monitor instances
- - Decision: Implemented cmdline validation for PID file verification on process health checks
- - Solved: Orphan monitor processes left running on health check timeout → Moved signal handler registration inside health-check try block and removed it in finally clause to prevent ephemeral handler leak
- - Solved: Stale URL used when monitor was already running from previous session → Added alreadyRunning detection and always update BETTERDB_URL and process.env.BETTERDB_URL in both fresh start and reuse scenarios
- - Solved: Blocking sleep(1000) in monitor startup preventing rapid iteration → Replaced sleep with polling loop checking for port availability release on success
- - Solved: Unconsumed stdout pipe from child process causing resource leak → Added explicit stdout.pipe(process.stdout) for monitor subprocess
- - Solved: Inconsistent API request handling with duplicate fetch logic → Created unified apiRequest function replacing separate apiFetch and rawFetch methods
- - Open: TOCTOU vulnerability: PID can be recycled between existsSync check and process.kill(pid, 0) — needs cmdline validation
- - Open: Race condition on concurrent start_monitor/stop_monitor calls mutating module-level BETTERDB_URL and detectedPrefix state
- - Open: stopMonitor may throw if PID file already deleted by concurrent call — needs atomic unlink with error suppression
- - Open: process.kill(pid, 0) succeeds on Windows regardless of actual process state — cross-platform validation needed
- - Open: No timeout enforcement on initial port availability polling — could hang indefinitely if port never releases
-- **[2026-03-19]** Added MCP monitor autostart/stop lifecycle management with environment variable override support and fixed PID write guards and polling logic.
- - Decision: Add environment variable override support in mapConfigToEnv with || operator fallback pattern
- - Decision: Implement MCP tool handlers for start_monitor and stop_monitor lifecycle management
- - Decision: Use process.once() for signal handling in autostart lifecycle
- - Decision: Guard PID file writes with existence checks before writing
- - Decision: Replace sleep-based port release polling with active port-check polling
- - Decision: Module-level BETTERDB_URL and detectedPrefix variables for connection state
- - Decision: Deduplicate apiFetch calls by moving to shared utility
- - Decision: Autostart mode triggered by CLI flag with persist option for background monitoring
- - Solved: PID file writes not guarded against errors or existing files → Added existence checks before writing PID file (commit b833ae3)
- - Solved: Sleep-based polling for port release was inefficient and brittle → Replaced with active port-release polling mechanism (commit b833ae3)
- - Solved: apiFetch logic duplicated across multiple code paths → Deduplicated into single utility function (commit b833ae3)
- - Solved: Config values overriding environment variables in runner setup → Reversed precedence to allow environment variable override of saved config
- - Solved: No way to manage monitor lifecycle from MCP tools → Added start_monitor and stop_monitor MCP tool handlers with persistence support
- - Open: Signal handler accumulation risk in ephemeral mode if startMonitor called multiple times (handlers not deduplicated by closure)
- - Open: Race condition possible on mutable module-level BETTERDB_URL when tool calls execute concurrently
- - Open: stopMonitor does not await process exit before returning, port may still be in use
- - Open: Missing test coverage for env variable override precedence in mapConfigToEnv
- - Open: No documented guarantee about concurrent tool call safety for start_monitor with other tools
-
-## Files with History
-- packages/mcp/src/autostart.ts
-- packages/mcp/src/index.ts
-- packages/cli/src/runner.ts
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/common/interfaces/database-port.interface.ts
-- Implemented Valkey vector search index stats API endpoints with capability detection, resolved agent connection ID instability, and identified hot key data loss was due to in-memory-only tracking before persistent storage was added. (2026-03-13)
-- Implemented comprehensive vector search index stats and live KNN search tester with multi-version support for Valkey Search and RediSearch, including HNSW parameters and field metadata extraction. (2026-03-11)
-- Implemented full-stack vector search with FT.SEARCH integration, agent binary protocol support, input validation, and React UI for KNN similarity queries. (2026-03-12)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/common/interfaces/storage-port.interface.ts
-- Added vector index snapshot tracking with 30-second polling, storage adapters, API endpoint, and frontend Sparkline visualization component. (2026-03-14)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/shared/src/index.ts
-- Added vector index snapshot tracking with 30-second polling, storage adapters, API endpoint, and frontend Sparkline visualization component. (2026-03-14)
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/database/adapters/unified.adapter.ts
-- Implemented Valkey Search index stats backend (capability detection, on-demand FT.INFO parsing, controller endpoints) and frontend page, but FT.INFO parser needs debugging to correctly extract vector metadata from the flat array response. (2026-03-11)
-- Fixed missing capability guard in vectorSearch and prevented binary arg replacement from corrupting command names. (2026-03-13)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/app.module.ts
-- Scaffolded @betterdb/mcp as a thin wrapper MCP server that authenticates via agent tokens and proxies Valkey observability queries through a dedicated guarded API controller. (2026-03-13)
-- Implemented Valkey vector search index stats API endpoints with capability detection, resolved agent connection ID instability, and identified hot key data loss was due to in-memory-only tracking before persistent storage was added. (2026-03-13)
-- Scaffolded packages/mcp as a production-ready MCP server, secured it with agent token authentication, fixed three security/correctness bugs in workflows and services, and cleaned up unused dependencies. (2026-03-13)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/mcp/mcp.module.ts
-- Scaffolded @betterdb/mcp as a thin wrapper MCP server that authenticates via agent tokens and proxies Valkey observability queries through a dedicated guarded API controller. (2026-03-13)
-- Scaffolded a complete MCP server for BetterDB with JWT token auth, 6 observability endpoints, and integrated it into Claude Code for end-user testing. (2026-03-12)
-- Scaffolded packages/mcp as a production-ready MCP server, secured it with agent token authentication, fixed three security/correctness bugs in workflows and services, and cleaned up unused dependencies. (2026-03-13)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/mcp/mcp.controller.ts
-- Scaffolded @betterdb/mcp as a thin wrapper MCP server that authenticates via agent tokens and proxies Valkey observability queries through a dedicated guarded API controller. (2026-03-13)
-- Scaffolded a complete MCP server for BetterDB with JWT token auth, 6 observability endpoints, and integrated it into Claude Code for end-user testing. (2026-03-12)
-- Scaffolded packages/mcp as a production-ready MCP server, secured it with agent token authentication, fixed three security/correctness bugs in workflows and services, and cleaned up unused dependencies. (2026-03-13)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/mcp/src/index.ts
-- Scaffolded @betterdb/mcp as a thin wrapper MCP server that authenticates via agent tokens and proxies Valkey observability queries through a dedicated guarded API controller. (2026-03-13)
-- Scaffolded a complete MCP server for BetterDB with JWT token auth, 6 observability endpoints, and integrated it into Claude Code for end-user testing. (2026-03-12)
-- Scaffolded packages/mcp as a production-ready MCP server, secured it with agent token authentication, fixed three security/correctness bugs in workflows and services, and cleaned up unused dependencies. (2026-03-13)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/mcp/src/autostart.ts
-- Fixed environment variable precedence in CLI config resolution so MCP can override the monitor port via process.env, and updated MCP autostart to use local CLI bin for testing before npm release. (2026-03-19)
-- Fixed three high/medium severity issues in MCP autostart and monitor: ephemeral signal handler leak, async process exit waiting, and stdout pipe blocking. (2026-03-19)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/agent/README.md
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/agent/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/agent/src/index.ts
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/agent/src/command-executor.ts
-- Implemented full-stack vector search with FT.SEARCH integration, agent binary protocol support, input validation, and React UI for KNN similarity queries. (2026-03-12)
-- Implemented Vector Search index stats support with on-demand FT.INFO/FT.SEARCH querying and fixed binary arg corruption in agent executor. (2026-03-13)
-- Fixed missing capability guard in vectorSearch and prevented binary arg replacement from corrupting command names. (2026-03-13)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/shared/src/types/health.ts
-- Implemented Valkey vector search index stats API endpoints with capability detection, resolved agent connection ID instability, and identified hot key data loss was due to in-memory-only tracking before persistent storage was added. (2026-03-13)
-- Implemented Vector Search index stats support with on-demand FT.INFO/FT.SEARCH querying and fixed binary arg corruption in agent executor. (2026-03-13)
-- Implemented Valkey Search index stats backend (capability detection, on-demand FT.INFO parsing, controller endpoints) and frontend page, but FT.INFO parser needs debugging to correctly extract vector metadata from the flat array response. (2026-03-11)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/common/types/metrics.types.ts
-- Implemented Valkey vector search index stats API endpoints with capability detection, resolved agent connection ID instability, and identified hot key data loss was due to in-memory-only tracking before persistent storage was added. (2026-03-13)
-- Implemented comprehensive vector search index stats and live KNN search tester with multi-version support for Valkey Search and RediSearch, including HNSW parameters and field metadata extraction. (2026-03-11)
-- Implemented Vector Search index stats support with on-demand FT.INFO/FT.SEARCH querying and fixed binary arg corruption in agent executor. (2026-03-13)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/shared/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/storage/adapters/sqlite.adapter.ts
-- Added vector index snapshot tracking with 30-second polling, storage adapters, API endpoint, and frontend Sparkline visualization component. (2026-03-14)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/storage/adapters/postgres.adapter.ts
-- Added vector index snapshot tracking with 30-second polling, storage adapters, API endpoint, and frontend Sparkline visualization component. (2026-03-14)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/storage/adapters/memory.adapter.ts
-- Added vector index snapshot tracking with 30-second polling, storage adapters, API endpoint, and frontend Sparkline visualization component. (2026-03-14)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/web/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
-
-## File History: /home/kristiyan/projects/valkey/monitor/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/web/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/shared/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
-
-## File History: /home/kristiyan/projects/valkey/monitor/packages/semantic-cache/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/web/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/web/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/api/src/common/interfaces/database-port.interface.ts
-- Implemented full-stack vector search with FT.SEARCH integration, agent binary protocol support, input validation, and React UI for KNN similarity queries. (2026-03-12)
-- Implemented comprehensive vector search index stats and live KNN search tester with multi-version support for Valkey Search and RediSearch, including HNSW parameters and field metadata extraction. (2026-03-11)
-- Implemented Valkey vector search index stats API endpoints with capability detection, resolved agent connection ID instability, and identified hot key data loss was due to in-memory-only tracking before persistent storage was added. (2026-03-13)
-
-## File History: /home/kristiyan/projects/valkey/monitor/apps/web/package.json
-- Converted BetterDB Memory from Docker-based project to standalone npm package with compile-on-install flow, validated packaging pipeline, and prepared GitHub Actions for automated npm publishing. (2026-03-02)
-- Implemented SSH tunnel support for remote Valkey connections with password/key auth, TLS SNI passthrough, and fixed SSH client/secret lifecycle management issues. (2026-03-04)
-- Implemented Valkey Search (FT) index browser as a collapsible tree in VS Code extension with schema awareness, field type badges, and FT.INFO parsing for HASH/JSON indexed keys. (2026-03-18)
\ No newline at end of file
diff --git a/.claude/settings.local.json b/.claude/settings.local.json
deleted file mode 100644
--- a/.claude/settings.local.json
+++ /dev/null
@@ -1,187 +1,0 @@
-{
- "permissions": {
- "allow": [
- "Bash(git fetch:*)",
- "Bash(fuser:*)",
- "Bash(NODE_ENV=development BETTERDB_LICENSE_KEY=asdasd STORAGE_TYPE=postgres STORAGE_URL=\"postgresql://betterdb:devpassword@localhost:5432/betterdb\" DB_HOST=localhost DB_PORT=6380 DB_PASSWORD=devpassword DB_USERNAME=default pnpm dev:*)",
- "Bash(curl:*)",
- "Bash(__NEW_LINE_69c8496eb0256e29__ echo \"\")",
- "Bash(# Get full response to see timestamps curl -s \"\"http://localhost:3001/metrics/slowlog?count=30&excludeMonitor=true\"\")",
- "Bash(__NEW_LINE_6cdd3e3918f56f8e__ echo \"\")",
- "Bash(pnpm add:*)",
- "Bash(NODE_ENV=development BETTERDB_LICENSE_KEY=asdasd STORAGE_TYPE=postgres STORAGE_URL=\"postgresql://postgres:postgres@localhost:5433/betterdb\" pnpm dev:*)",
- "Bash(NODE_ENV=development BETTERDB_LICENSE_KEY=asdasd DATABASE_HOST=localhost DATABASE_PORT=6380 DATABASE_PASSWORD=devpassword STORAGE_TYPE=postgres STORAGE_URL=\"postgresql://betterdb:devpassword@localhost:5432/betterdb\" pnpm dev:*)",
- "Bash(pnpm:*)",
- "Bash(valkey-cli:*)",
- "Bash(pkill:*)",
- "Bash(pgrep:*)",
- "Bash(npx tsc:*)",
- "Bash(npx ts-node:*)",
- "Bash(# Kill all related processes and start fresh pkill -9 -f \"\"ts-node\"\" || true pkill -9 -f \"\"vite\"\" || true sleep 2 cd /home/kristiyan/projects/valkey/monitor && NODE_ENV=development DB_HOST=localhost DB_PORT=6380 DB_PASSWORD=devpassword STORAGE_TYPE=postgres STORAGE_URL=\"\"postgresql://betterdb:devpassword@localhost:5432/betterdb\"\" pnpm dev 2>&1 & sleep 8 grep -E \"\"\\(LOG|ERROR|starting|running\\)\"\" /tmp/server2.log)",
- "Bash(PGPASSWORD=devpassword psql:*)",
- "Bash(python3:*)",
- "Bash(node /tmp/insert-test-data.js:*)",
- "Bash(# Test with exact range around yesterday''s timestamp \\(1769538717\\) echo \"\"Querying for timestamp around 1769538717 \\(yesterday\\)\"\" curl -s \"\"http://localhost:3001/commandlog-analytics/entries?type=slow&startTime=1769538000&endTime=1769539000&limit=50\"\" cat /tmp/yesterday.json)",
- "Bash(ls:*)",
- "Bash(docker run:*)",
- "Bash(docker logs:*)",
- "Bash(docker rm:*)",
- "Bash(docker ps:*)",
- "Bash(docker stop:*)",
- "Bash(docker network ls:*)",
- "Bash(docker exec betterdb-monitor-valkey valkey-cli:*)",
- "Bash(# Find the SQLite database in the container docker exec betterdb-monitor-app ls -la /app/data/ || docker exec betterdb-monitor-app find /app -name \"\"*.db\"\")",
- "Bash(docker exec:*)",
- "Bash(docker pull:*)",
- "Bash(git -C /home/kristiyan/projects/valkey/monitor ls-files:*)",
- "Bash(git -C /home/kristiyan/projects/valkey/monitor/proprietary ls-files:*)",
- "Bash(git -C /home/kristiyan/projects/valkey/monitor check-ignore proprietary/)",
- "Bash(git add:*)",
- "Bash(git check-ignore:*)",
- "Bash(docker compose:*)",
- "Bash(docker start:*)",
- "Bash(docker restart:*)",
- "Bash(ss:*)",
- "Bash(netstat:*)",
- "Bash(# Check current memory usage docker exec betterdb-monitor-valkey valkey-cli -a devpassword --no-auth-warning INFO memory)",
- "Bash(# Check what network the existing containers are on docker network ls docker inspect betterdb-monitor-valkey --format ''{{range .NetworkSettings.Networks}}{{.NetworkID}}{{end}}'')",
- "Bash(docker inspect:*)",
- "Bash(tree:*)",
- "Bash(npx jest:*)",
- "Bash(NODE_OPTIONS=\"--experimental-vm-modules\" node:*)",
- "Bash(echo:*)",
- "Bash(SKIP_DOCKER_SETUP=1 npx jest:*)",
- "Bash(SKIP_DOCKER_SETUP=true npx jest:*)",
- "Bash(for i in {1..20})",
- "Bash(do echo \"Pull #$i\")",
- "Bash(done)",
- "Bash(git pull:*)",
- "Bash(sudo lsof:*)",
- "Bash(npm run build:*)",
- "Bash(npm test:*)",
- "Bash(python:*)",
- "Bash(source venv/bin/activate)",
- "Bash(VALKEY_PASSWORD=devpassword python3:*)",
- "Bash(grep:*)",
- "Bash(# Check the password from docker-compose or environment grep -r \"\"6380\\\\|password\\\\|VALKEY\"\" /home/kristiyan/projects/valkey/monitor/docker-compose*.yml)",
- "Bash(redis-cli:*)",
- "Bash(# Check current connected clients echo \"\"Current connected clients:\"\" redis-cli -p 6380 -a devpassword CLIENT LIST)",
- "Bash(# Check rejected_connections before echo \"\"Before - rejected connections:\"\" redis-cli -p 6380 -a devpassword INFO clients)",
- "Bash(# Kill any lingering background redis-cli processes pkill -f \"\"BLPOP.*flood_queue\"\" pkill -f \"\"BLPOP.*waiting_queue\"\" # Get baseline rejected connections echo \"\"Baseline rejected_connections:\"\" redis-cli -p 6380 -a devpassword INFO clients)",
- "Bash(# Check docker-compose for 6381 config grep -A 20 \"\"6381\"\" /home/kristiyan/projects/valkey/monitor/docker-compose.yml)",
- "Bash(node -e:*)",
- "Bash(do valkey-cli -p 6380 SET \"test-key-$i\" \"value-$i\")",
- "Bash(# Create multiple connections to 6380 to spike connection count for i in {1..10}; do \\(valkey-cli -p 6380 DEBUG SLEEP 5 &\\) done echo \"\"Spiked connections on 6380. Wait 10 seconds for metrics to be collected...\"\" sleep 10 valkey-cli -p 6380 CLIENT LIST)",
- "Bash(# Use BLPOP to keep connections open on 6380 for i in {1..15}; do \\(valkey-cli -p 6380 BLPOP nonexistent-key-$i 30 &\\) done echo \"\"Created blocking connections on 6380\"\" sleep 2 valkey-cli -p 6380 CLIENT LIST)",
- "Bash(find:*)",
- "Bash(__NEW_LINE_f35492c487c1cd20__ echo \"\")",
- "Bash(__NEW_LINE_6c920833c1681c54__ echo \"\")",
- "Bash(__NEW_LINE_39bd60a6c51894a0__ echo \"\")",
- "Bash(__NEW_LINE_66edc211ee499a7e__ echo \"\")",
- "Bash(docker-compose ps:*)",
- "Bash(xargs:*)",
- "Bash(__NEW_LINE_42366b605d3a7b88__ echo \"\")",
- "Bash(__NEW_LINE_53e463c78b7eeb92__ echo \"\")",
- "Bash(# Generate some data first for i in {1..1000}; do docker exec betterdb-monitor-valkey valkey-cli -a devpassword SET \"\"testkey:$i\"\" \"\"value$i\"\" done echo \"\"Created 1000 keys\"\" # Run expensive KEYS command \\(will be slow and logged\\) docker exec betterdb-monitor-valkey valkey-cli -a devpassword KEYS \"\"*\"\")",
- "Bash(# Generate data on 6381 for i in {1..1000}; do docker exec valkey-6381 valkey-cli -a devpassword SET \"\"testkey:$i\"\" \"\"value$i\"\" done echo \"\"Created 1000 keys on 6381\"\" # Run expensive KEYS command docker exec valkey-6381 valkey-cli -a devpassword KEYS \"\"*\"\")",
- "Bash(WEBHOOK_ID=\"7e8fd7cc-931a-4485-916e-bb0c519271e9\":*)",
- "Bash(__NEW_LINE_d20a33920bc98c2c__ echo \"\")",
- "Bash(node /home/kristiyan/projects/valkey/monitor/packages/cli/bin/betterdb.js:*)",
- "Bash(npx esbuild:*)",
- "Bash(docker network inspect:*)",
- "Bash(npm view:*)",
- "Bash(node:*)",
- "Bash(npm pack:*)",
- "Bash(git -C /home/kristiyan/projects/valkey/monitor log --oneline -5)",
- "Bash(npm cache clean:*)",
- "Bash(npm uninstall:*)",
- "Bash(npm install:*)",
- "Bash(betterdb:*)",
- "Bash(gh run list:*)",
- "Bash(gh run view:*)",
- "Bash(lsof:*)",
- "Bash(npm ls:*)",
- "Bash(sudo npm uninstall:*)",
- "Bash(sudo npm install:*)",
- "Bash(npx prisma migrate dev:*)",
- "Bash(npx prisma generate:*)",
- "Bash(docker build:*)",
- "Bash(docker images:*)",
- "Bash(cd /home/kristiyan/projects/valkey/monitor/apps/frontend && npx vite build 2>&1)",
- "Bash(cd /home/kristiyan/projects/valkey/monitor/apps/web && npx vite build 2>&1)",
- "Bash(git:*)",
- "Bash(roborev show:*)",
- "Bash(roborev review:*)",
- "Bash(roborev list:*)",
- "Bash(roborev status:*)",
- "Bash(roborev:*)",
- "mcp__pointer__get-pointed-element",
- "Bash(npm list:*)",
- "Bash(valkey-benchmark:*)",
- "Bash(npx vitest:*)",
- "Bash(gh release:*)",
- "Bash(gh pr:*)",
- "Bash(do echo:*)",
- "Bash(psql:*)",
- "Read(//usr/bin/**)",
- "Read(//proc/567452/**)",
- "Bash(kill 599425:*)",
- "Bash(kill -9 815654 2>/dev/null; lsof -ti:3001 | xargs kill -9 2>/dev/null; echo \"killed\")",
- "Bash(cd /home/kristiyan/projects/valkey/monitor/apps/web && ./node_modules/.bin/tsc --noEmit 2>&1 | head -20)",
- "Bash(cd /home/kristiyan/projects/valkey/monitor && npx turbo run test 2>&1 | tail -30)",
- "Bash(head -5 apps/api/jest.config.* 2>/dev/null || head -5 apps/api/package.json 2>/dev/null; grep -m1 '\"test\"' apps/api/package.json 2>/dev/null || true)",
- "Bash(npm run:*)",
- "Bash(kill 1600607 2>/dev/null; kill 1596434 2>/dev/null; sleep 1 && fuser -k 3001/tcp 2>/dev/null; fuser -k 5173/tcp 2>/dev/null; sleep 1 && echo \"All cleared\")",
- "Bash(kill 1607901)",
- "Bash(find /home/kristiyan/projects/valkey/monitor/proprietary -type f \\\\\\(-name *.ts -o -name *.js -o -name *.md \\\\\\))",
- "Skill(update-config)",
- "Bash(npx:*)",
- "Bash(kill:*)",
- "Bash(1 echo curl -s -o /dev/null -w %{http_code} http://localhost:3390/api/health 2)",
- "Bash(1 echo ss -tlnp)",
- "Bash(BETTERDB_URL=http://localhost:3390 node:*)",
- "Bash(/tmp/mcp-stderr.log echo 'EXIT: $?' echo '=== STDOUT ===' cat /tmp/mcp-stdout.log echo '=== STDERR ===' cat /tmp/mcp-stderr.log)",
- "Bash(printf:*)",
- "Bash(/tmp/mcp-stderr2.log echo 'EXIT: $?' echo '=== STDOUT ===' cat /tmp/mcp-stdout2.log echo '=== STDERR ===' cat /tmp/mcp-stderr2.log)",
- "Bash(xxd:*)",
- "Bash(bash:*)",
- "Bash(PORT=3001 npx @betterdb/monitor:*)",
- "WebFetch(domain:static.modelcontextprotocol.io)",
- "Bash(VALKEY_URL=redis://localhost:6390 pnpm test 2>&1)",
- "Bash(VALKEY_URL=redis://localhost:6390 npx tsx index.ts --mock)",
- "Bash(ln -sf ../../../../ /home/kristiyan/projects/valkey/monitor/packages/semantic-cache/examples/basic/node_modules/@betterdb/semantic-cache)",
- "Bash(cd:*)",
- "Bash(VALKEY_URL=redis://localhost:6390 node -e \":*)",
- "WebFetch(domain:eclips4.github.io)",
- "Bash(node -e \"console.log\\(require\\(''iovalkey/package.json''\\).version\\)\")",
- "WebFetch(domain:github.com)",
- "Bash(gh api:*)",
- "WebFetch(domain:www.npmjs.com)",
- "Bash(claude mcp:*)",
- "Read(//tmp/**)",
- "Bash(tar xzf:*)",
- "Bash(npm --version)",
- "mcp__betterdb__list_instances",
- "mcp__betterdb__select_instance",
- "mcp__betterdb__get_health",
- "mcp__betterdb__get_slowlog",
- "mcp__betterdb__get_slowlog_patterns",
- "mcp__betterdb__get_memory",
- "mcp__betterdb__get_hot_keys",
- "mcp__betterdb__get_info",
- "mcp__betterdb__get_anomalies",
- "Bash(cat:*)",
- "mcp__betterdb-memory__search_context",
- "Bash(test:*)",
- "Bash(tar:*)",
- "Bash(chmod:*)",
- "Bash(~/.betterdb/bin/redis-shake:*)",
- "Bash(wget:*)",
- "Bash(wc:*)"
- ],
- "deny": [],
- "ask": []
- },
- "enableAllProjectMcpServers": true,
- "enabledMcpjsonServers": []
-}
\ No newline at end of file
diff --git a/.gitignore b/.gitignore
--- a/.gitignore
+++ b/.gitignore
@@ -51,6 +51,7 @@
# Claude Code
.claude/settings.local.json
+.betterdb_context.md
# Turbo
.turbo
diff --git a/apps/api/src/migration/migration-execution.service.ts b/apps/api/src/migration/migration-execution.service.ts
--- a/apps/api/src/migration/migration-execution.service.ts
+++ b/apps/api/src/migration/migration-execution.service.ts
@@ -42,8 +42,9 @@
}
// 3. Detect if source is cluster
- const info = await sourceAdapter.getInfo(['cluster']);
- const clusterEnabled = String((info as Record<string, unknown>)['cluster_enabled'] ?? '0') === '1';
+ const info = await sourceAdapter.getInfo(['cluster']) as Record<string, Record<string, string>>;
+ const clusterSection = info.cluster ?? {};
+ const clusterEnabled = String(clusterSection['cluster_enabled'] ?? '0') === '1';
// 4. For redis_shake mode, locate the binary upfront
let binaryPath: string | undefined;
diff --git a/apps/api/src/migration/migration-validation.service.ts b/apps/api/src/migration/migration-validation.service.ts
--- a/apps/api/src/migration/migration-validation.service.ts
+++ b/apps/api/src/migration/migration-validation.service.ts
@@ -233,7 +233,7 @@
if (this.jobs.size < this.MAX_JOBS) return;
const terminal = Array.from(this.jobs.entries())
- .filter(([, j]) => j.status === 'completed' || j.status === 'failed')
+ .filter(([, j]) => j.status === 'completed' || j.status === 'failed' || (j.status as string) === 'cancelled')
.sort((a, b) => a[1].createdAt - b[1].createdAt);
This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
// 7. Compute baseline window
const oldestTimestamp = snapshots[snapshots.length - 1].timestamp;
const baselineWindowMs = migrationStartedAt - oldestTimestamp;
Baseline window uses wrong snapshot for oldest timestamp
Medium Severity
snapshots[snapshots.length - 1].timestamp is assumed to be the oldest snapshot, but getMemorySnapshots with a limit and no explicit ordering typically returns results in ascending timestamp order (oldest first for SQL ORDER BY timestamp patterns). If so, snapshots[snapshots.length - 1] is actually the newest, making baselineWindowMs near zero instead of spanning the full pre-migration window.
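A minimal order-agnostic sketch of the suggested fix (assuming each snapshot carries a numeric `timestamp` in milliseconds; the function name is illustrative, not from the codebase):

```javascript
// Hypothetical sketch: take the true oldest snapshot regardless of how the
// storage adapter happens to order its results.
function baselineWindowMs(snapshots, migrationStartedAt) {
  const oldestTimestamp = Math.min(...snapshots.map((s) => s.timestamp));
  return migrationStartedAt - oldestTimestamp;
}
```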
jamby77
left a comment
Code Review
3 blocking issues, 5 warnings — details in inline comments.
Positive
- Clean NestJS module structure with proper separation (analysis/execution/validation)
- License gating correctly applied (analysis=community, execution/validation=Pro)
- Solid test coverage (13 spec files, ~2,240 lines)
- Correct TTL handling using pttl/pexpire — no ms/s confusion
- Cancellation checks at multiple points in SCAN loop
- TOML file cleanup in finally blocks, permissions 0o600
- Connection cleanup via Promise.allSettled
- HFE detection gracefully handles Redis sources where HEXPIRETIME doesn't exist
const tgtPassword = target.password ?? '';

let toml = `[scan_reader]
address = "${source.host}:${source.port}"
CRITICAL: TOML injection via host values
source.host is interpolated without escapeTomlString(), but username/password on lines 26-27 are escaped. A connection configured with a host like evil"\n[source]\naddress = "attacker:6379 could inject arbitrary TOML sections, redirecting the migration to an attacker-controlled instance.
escapeTomlString() is already defined in this file — just needs to be applied here and on line 37 (target.host):
- address = "${source.host}:${source.port}"
+ address = "${escapeTomlString(source.host)}:${source.port}"
toml += `
[redis_writer]
address = "${target.host}:${target.port}"
Same TOML injection issue as line 25 — target.host needs escapeTomlString():
- address = "${target.host}:${target.port}"
+ address = "${escapeTomlString(target.host)}:${target.port}"
.betterdb_context.md
Outdated
@@ -0,0 +1,298 @@
# BetterDB Session Context
CRITICAL: AI session context committed
This 298-line file contains internal session memory — PIDs, file paths, security discussion notes. Should not be in the repo.
Delete this file and add .betterdb_context.md to .gitignore.
.claude/settings.local.json
Outdated
@@ -0,0 +1,187 @@
{
CRITICAL: Local developer settings committed
The PR adds .claude/settings.local.json to .gitignore but the file is still tracked. Run:
git rm --cached .claude/settings.local.json
continue;
}

const result = await migrateKey(sourceClient, targetClient, key, type);
WARNING: Sequential key migration — no batching
Each key is migrated one at a time with await migrateKey(...). While this avoids overwhelming the target, it's very slow for large datasets (1M+ keys could take hours).
Consider processing keys in parallel batches (e.g., Promise.all over chunks of 50-100 keys) to improve throughput while keeping backpressure bounded. The type handlers are already per-key and independent.
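A rough sketch of that batching approach (`migrateOne` stands in for the real per-key `migrateKey` call; the chunk size is illustrative, not measured):

```javascript
// Hypothetical sketch: bounded parallelism via sequential chunks of Promise.all.
async function migrateInBatches(keys, migrateOne, batchSize = 50) {
  let transferred = 0;
  for (let i = 0; i < keys.length; i += batchSize) {
    const chunk = keys.slice(i, i + batchSize);
    // All keys in a chunk run concurrently; chunks run one after another,
    // so at most `batchSize` migrations are in flight at a time.
    const results = await Promise.all(chunk.map((k) => migrateOne(k)));
    transferred += results.filter(Boolean).length;
  }
  return transferred;
}
```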
for (const [field, val] of Object.entries(data)) {
  args.push(field, val as Buffer);
}
await (target as any).hset(...args);
There was a problem hiding this comment.
WARNING: as any casts on iovalkey client — 7 occurrences in this file (lines 80, 92, 110, 123, 130, 142, 179)
Project convention prohibits any. These casts exist because iovalkey's type defs don't support spread args for hset, rpush, sadd, xadd.
Workaround: use client.call('HSET', key, ...fields) which accepts string[] and avoids as any. Easier to audit for correctness too.
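As a standalone sketch of the call-based workaround (`buildHsetArgs` is a hypothetical helper, not part of the codebase; the client in the usage comment is a stand-in rather than a real iovalkey instance):

```javascript
// Hypothetical helper: flatten an object into the flat argument list that
// client.call('HSET', ...) expects, sidestepping the spread-arg typing gap.
function buildHsetArgs(key, data) {
  const args = [key];
  for (const [field, val] of Object.entries(data)) {
    args.push(field, String(val));
  }
  return args;
}

// Usage (stand-in client):
// await client.call('HSET', ...buildHsetArgs('h:1', { a: '1', b: '2' }));
```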
Dockerfile
Outdated
ARG TARGETARCH
ARG REDISSHAKE_VERSION=4.6.0
RUN apk add --no-cache wget && \
    wget -qO- "https://github.com/tair-opensource/RedisShake/releases/download/v${REDISSHAKE_VERSION}/redis-shake-v${REDISSHAKE_VERSION}-linux-${TARGETARCH}.tar.gz" \
There was a problem hiding this comment.
WARNING: RedisShake binary downloaded without checksum verification
The binary is fetched via HTTPS (good) but not verified against a SHA256 hash. A compromised CDN or MITM during build could inject a malicious binary that runs with full access to source/target databases.
Fix: pin a checksum and verify after download:
RUN wget -qO /tmp/redis-shake.tar.gz "https://..." && \
echo "<expected_sha256> /tmp/redis-shake.tar.gz" | sha256sum -c - && \
    tar -xz ...

Same applies to Dockerfile.prod.
# Conflicts: # pnpm-lock.yaml
…am DEL, cancelled status - Move evictOldJobs() before jobs.set() to prevent exceeding MAX_JOBS - Add sanitizeLogLine() to redact credentials from RedisShake logs - Add target.del(key) in migrateStream before XADD to prevent duplicates - Handle cancelled validation status in runValidation finally block Co-Authored-By: Claude <noreply@anthropic.com>
Add --allowedTools with the inline comment MCP tool, gh pr commands, and code exploration tools. Update prompt with repo/PR context and explicit instructions to use create_inline_comment for per-line feedback. Co-Authored-By: Claude <noreply@anthropic.com>
- Use ?? instead of || for totalKeys so 0 is not coerced to undefined - Rewrite sanitizeLogLine to avoid overlapping patterns and ensure full redaction of quoted/unquoted passwords and URL credentials Co-Authored-By: Claude <noreply@anthropic.com>
…r issue_comment events Co-Authored-By: Claude <noreply@anthropic.com>
function sanitizeLogLine(line: string): string {
  let sanitized = line;
  // 1. Quoted passwords: password = "secret" or password:"secret"
  sanitized = sanitized.replace(/password\s*[=:]\s*"[^"]*"/gi, (match) => {
High — Password redaction fails for TOML-escaped passwords containing "
The regex "[^"]*" stops at the first " it sees — even one that is backslash-escaped. But escapeTomlString writes passwords like foo"bar as "foo\"bar" in the TOML file. The pattern matches only "foo\" — cutting off at the escaped inner quote — and the tail bar" leaks into the log output.
Example TOML log line:
password = "s3cr\"et"
After sanitization → password = "***"et" — the suffix et" is exposed.
Fix: use a regex that handles TOML escaped quotes (\"), or simply replace everything after the =/: up to the end of the token without trying to parse the TOML quoting:
- sanitized = sanitized.replace(/password\s*[=:]\s*"[^"]*"/gi, (match) => {
+ sanitized = sanitized.replace(/password\s*[=:]\s*"(?:[^"\\]|\\.)*"/gi, (match) => {
+   const eqIdx = match.search(/[=:]/);
+   return match.slice(0, eqIdx + 1) + ' "***"';
+ });
The same applies to pattern 2 (\S+) — it will mis-fire on the leftover et" after the partial match, but the root cause is pattern 1 not consuming the full quoted value.
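Concretely, the suggested pattern behaves like this (standalone sketch, assuming TOML-style backslash escapes in the input):

```javascript
// Sketch of the fixed redaction: (?:[^"\\]|\\.)* consumes escaped quotes,
// so the whole quoted password is replaced, not just its prefix.
function sanitizeLogLine(line) {
  return line.replace(/password\s*[=:]\s*"(?:[^"\\]|\\.)*"/gi, (match) => {
    const eqIdx = match.search(/[=:]/);
    return match.slice(0, eqIdx + 1) + ' "***"';
  });
}
```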
Review Summary

Overall the implementation is well-structured: TOML credential escaping is thorough, the temp-key/RENAME pattern for atomic writes is correct for the common case, binary-safe Buffer reads are used throughout, and the sanitized log pipeline is a thoughtful addition. Six issues found — two require fixes before merge.

High — Fix before merge

1. Validation job limit not enforced (
2. Cross-user
3. Empty hash-tag keys → CROSSSLOT in cluster migration (

Medium

4.
5.

Low

6.

Issues #1 and #3 are the highest-priority; both have silent failure modes (unbounded memory growth and skipped keys with no user-visible error, respectively). Issue #2 is the most security-sensitive.
…dling, atomic TTL, and list drift check
- tempKey() returns null for keys with empty/invalid hash tags (e.g.
`user:{}:1`) to avoid CROSSSLOT errors in cluster mode; all type
handlers fall back to direct writes with applyTtl when no safe temp
key is possible
- Validation evictOldJobs() now throws ServiceUnavailableException
instead of silently logging when all job slots are occupied
- Cross-user analysisId access blocked by verifying source/target
connection IDs match the analysis job
Co-Authored-By: Claude <noreply@anthropic.com>
const discrepancy = targetKeys - sourceKeys;
const discrepancyPercent = sourceKeys === 0
  ? (targetKeys > 0 ? 100 : 0)
  : Math.abs(discrepancy / sourceKeys) * 100;
Discrepancy percent uses absolute value inconsistently
Medium Severity
discrepancyPercent is computed as Math.abs(discrepancy / sourceKeys) * 100 (always positive), but the validation pass/fail check in migration-validation.service.ts tests Math.abs(keyCount.discrepancyPercent) < 1, which is redundant since it's already absolute. More critically, the discrepancy field is signed (target - source), but the percent is always positive, so consumers can't tell if keys were lost (negative discrepancy) vs. gained — the sign information is discarded.
Additional Locations (1)
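A sketch of a sign-preserving variant (the function name is illustrative, not from the codebase):

```javascript
// Hypothetical: keep the sign so a negative percent means keys were lost on
// the target and a positive percent means extra keys appeared.
function signedDiscrepancyPercent(sourceKeys, targetKeys) {
  const discrepancy = targetKeys - sourceKeys;
  if (sourceKeys === 0) return targetKeys > 0 ? 100 : 0;
  return (discrepancy / sourceKeys) * 100;
}
```

Consumers can then apply Math.abs() at the pass/fail threshold while still reporting direction.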
throw new BadRequestException('Analysis does not match the provided source/target connections');
}
analysisResult = job;
}
There was a problem hiding this comment.
Migration service getJob missing sourceConnectionId and targetConnectionId
Medium Severity
The validation service checks job.sourceConnectionId and job.targetConnectionId to verify the analysis matches the provided connections, but MigrationService.getJob() returns a MigrationAnalysisResult type, and these fields are only set on job.result (not on the returned object's top level until completion). If the analysis is still completed but the fields are nested in result, the cross-reference check may pass or fail incorrectly depending on timing.
Additional Locations (1)
if (tmp) { try { await target.del(tmp); } catch { /* best-effort cleanup */ } }
throw err;
}
return true;
There was a problem hiding this comment.
High — keysTransferred overcounted for expired compound keys
When pttl === -2 or pttl === 0 at line 151, the Lua script atomically renames the temp key then immediately DELs it (lines 350–351 of the Lua), discarding the data. Yet migrateHash still returns true here, causing job.keysTransferred++ in the caller — counting a key that no longer exists on the target.
The same issue exists in migrateList, migrateSet, migrateZset, and migrateStream.
Fix: check pttl before calling atomicRenameWithTtl and return false early for expired keys, so the write is also avoided:
- return true;
+ const pttl = await source.pttl(key);
+ if (pttl === -2 || pttl === 0) {
+   if (tmp) { try { await target.del(tmp); } catch { /* best-effort cleanup */ } }
+   return false;
+ }
+ if (tmp) {
+   await atomicRenameWithTtl(target, tmp, key, pttl);
+ } else {
+   await applyTtl(target, key, pttl);
+ }
+ } catch (err) {
+   if (tmp) { try { await target.del(tmp); } catch { /* best-effort cleanup */ } }
+   throw err;
+ }
+ return true;
Apply the same pattern to migrateList (line 182), migrateSet (line 223), migrateZset (line 278), and migrateStream (line 327).
if (!/NOSCRIPT|unknown command|DENIED|NOPERM/i.test(msg)) {
  throw err;
}
await target.rename(tmp, key);
There was a problem hiding this comment.
High — iovalkey Cluster client rejects multi-key RENAME
The fallback at this line calls target.rename(tmp, key). iovalkey's Cluster client always throws a CROSSSLOT error for multi-key commands — even when both keys hash to the same slot — because it validates cross-slot usage before routing.
tempKey() guarantees same-slot keys via hash tags, but the iovalkey Cluster client still rejects the call at the client layer.
The comment says this path covers NOSCRIPT | unknown command | DENIED | NOPERM. On a cluster target, EVAL could also fail with CROSSSLOT (if KEYS declaration is wrong) — and the fallback would then also fail with CROSSSLOT, producing a confusing double-failure.
Fix: on cluster targets, don't fall back — re-throw all errors:
- await target.rename(tmp, key);
+ if (!/NOSCRIPT|unknown command|DENIED|NOPERM/i.test(msg)) {
+   throw err;
+ }
+ // iovalkey Cluster rejects multi-key RENAME even for same-slot keys at the
+ // client layer. Raise here rather than producing a confusing secondary error.
+ if ((target as unknown as { isCluster?: boolean }).isCluster) {
+   throw err;
+ }
+ await target.rename(tmp, key);
Alternatively, pass an isCluster flag into atomicRenameWithTtl and skip the fallback when true.
try {
  // DEL the target when writing directly (no temp key)
  if (!tmp) await target.del(key);
Medium — unrecoverable data loss when tmp === null and write fails mid-stream
When tempKey() returns null (keys with empty/invalid hash tags like user:{}:1), the code deletes the existing target key here and then writes directly to key. If the HSCAN loop throws on a subsequent iteration — OOM, timeout, disconnection — the catch block at line 157 only tries to clean up tmp (which is null). The target key was already deleted and partially written, with no rollback path. The key is permanently lost on the target.
The same risk exists in migrateList (line 174), migrateSet (line 205), migrateZset (line 246), and migrateStream (line 299).
Consider either:
- Propagating this as a skip rather than a throw so the key is retried by the caller, or
- Documenting explicitly in `MigratedKey` that `tmp === null` keys have no atomicity guarantee, and surfacing a `warning` when this path is taken so the operator knows which keys may be partially written.
```ts
batchPromises.push(
  migrateKey(sourceClient, targetClient, key, type).then(result => {
    if (result.ok) {
      job.keysTransferred++;
```
High — keysTransferred incremented for keys that were deleted by expired-TTL logic
migrateKey returns ok: true for compound types even when pttl === 0 or pttl === -2 caused the Lua script to delete the just-written key (see type-handlers.ts RENAME_WITH_TTL_LUA). This causes the counter to over-report transfers. The validation phase will then find those keys missing on the target and flag them as discrepancies.
This is fixed by addressing the return value in type-handlers.ts (the linked comment there), but worth calling out here too — the counter increment relies on result.ok faithfully reflecting whether a key actually persists on the target after the operation.
```ts
  job.logs.shift();
}
const parsed = parseLogLine(line);
if (parsed.keysTransferred !== null) job.keysTransferred = parsed.keysTransferred;
```
High — counter can reset to zero from a non-progress log line
`job.keysTransferred` is assigned (not incremented) from parsed log lines. The regex at log-parser.ts:48 — `/scanned[=: ]+(\d+)/i` — is broad and will match log lines that aren't progress reports, e.g. an error message like:

```
[ERROR] 0 keys scanned: connection refused
```
This would set job.keysTransferred = 0, silently zeroing a counter that may have been in the thousands.
Fix: only update when the new value is at least as large as the current one:
```ts
if (parsed.keysTransferred !== null && parsed.keysTransferred >= job.keysTransferred) job.keysTransferred = parsed.keysTransferred;
```
```ts
count: data.count,
sampledMemoryBytes: data.bytes,
estimatedTotalMemoryBytes: sampledKeys.length > 0
  ? Math.round((data.bytes / sampledKeys.length) * totalKeys)
```
Medium — per-type memory estimate formula is wrong
`Math.round((data.bytes / sampledKeys.length) * totalKeys)` — `data.bytes` is the sampled memory for one specific type, but `sampledKeys.length` is the total number of sampled keys across all types. This formula assumes every key in the database is of `typeName`, inflating per-type estimates by `1 / typeShare`.
For example, if 10% of keys are hashes, the hash memory estimate will be ~10× too large (the formula multiplies hash bytes × all-key count instead of hash bytes × estimated hash count).
Correct formula:
```ts
estimatedTotalMemoryBytes: data.count > 0 && sampledKeys.length > 0
  ? Math.round((data.bytes / data.count) * Math.round((data.count / sampledKeys.length) * totalKeys))
  : 0,
```
The same bug appears for the other bucket at lines 290–292. Note that the total estimate at lines 300–302 is correct since it sums all bytes and divides by all sampled keys.
```ts
const targetIsCluster = String(targetClusterSection['cluster_enabled'] ?? '0') === '1';

// Create temporary iovalkey clients — same pattern as command-migration-worker.ts
sourceClient = createClient(sourceConfig, 'BetterDB-Validation-Source');
```
Medium — cluster source gets a biased sample in validation
createClient creates a single standalone connection to sourceConfig.host:port. When the source is a cluster, collectSampleKeys (via SCAN) only iterates the keyspace on that one node — typically 1/N of all keys, where N is the number of master nodes. The sample will miss keys homed on other shards, making validation statistics significantly unreliable for cluster sources.
The analysis phase handles this correctly by enumerating all master nodes (see command-migration-worker.ts:34–53). The validation phase should do the same — detect cluster topology, create per-node clients, and merge the samples.
At minimum, detect the cluster mode here (the same way targetIsCluster is detected above) and log a warning if sourceIsCluster === true, so the operator knows the sample is incomplete.
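A minimal sketch of that detection, assuming the raw `INFO` cluster section is available as text (helper name is illustrative, not from the codebase):

```typescript
// Parse the cluster_enabled flag from a raw INFO "# Cluster" section so the
// validation phase can warn when its single-node sample is incomplete.
function parseClusterEnabled(infoClusterSection: string): boolean {
  const match = infoClusterSection.match(/^cluster_enabled:(\d)/m);
  return match !== null && match[1] === '1';
}

const sourceIsCluster = parseClusterEnabled('# Cluster\r\ncluster_enabled:1\r\n');
if (sourceIsCluster) {
  // Hypothetical warning surfaced to the operator alongside validation results
  console.warn('[validation] source is a cluster; SCAN sample covers only ~1/N of the keyspace');
}
```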
Code Review Summary

The architecture is solid overall — binary-safe Buffer reads throughout, hash-tag-based same-slot temp keys, TOML injection protection, credential redaction in logs, and atomic temp-key-then-rename for compound types are all well-implemented. A few correctness and data-integrity issues need attention before merge.

Must fix
Should fix
Thanks for the thorough review. Investigated each finding - here's why we're comfortable deferring these:

- **pttl race deleting the renamed key (type-handlers.ts:151)** - The source key must expire in the sub-millisecond window between the data read and the PTTL call on the same connection. When it does happen, the Lua script DELs the target, which is actually correct behavior (the key expired, so it shouldn't exist on the target). The only effect is keysTransferred overcounts by 1 for a key that was already dying. No data integrity risk.
- **CROSSSLOT in atomicRenameWithTtl fallback (type-handlers.ts:385)** - tempKey() specifically constructs same-slot keys using hash tags, so iovalkey routes both keys to the same node. The claim that iovalkey rejects all multi-key commands regardless of slot isn't accurate for properly hash-tagged keys. This fallback also only triggers when EVAL is ACL-blocked on the target, which is uncommon. We've run this against cluster targets without hitting CROSSSLOT.
- **Regex keysTransferred overcount (migration-execution.service.ts:131)** - This log parser is only used for the redis-shake subprocess path. Command-mode migration (the primary path) increments job.keysTransferred directly and never touches the parser. For redis-shake, the parser tries JSON first and only falls back to regex for non-JSON output. Even if the regex matches a stray line, it would briefly set the counter to a wrong value that gets corrected on the next real progress line. Counter display glitch at worst.
- **Per-type memory estimate (migration.service.ts:275)** - The per-type estimatedTotalMemoryBytes values are indeed skewed (divided by total sample count rather than per-type count). But the aggregate estimatedTotalMemoryBytes at line 300 uses the correct formula. The per-type numbers appear in the analysis breakdown UI only - they don't feed into any migration logic or decisions. Worth fixing eventually for accuracy, but not a correctness issue.
- **Validation SCAN on cluster (migration-validation.service.ts:111)** - The validation samples 500 keys from one node, so on an N-master cluster it covers ~1/N of the keyspace. This is a spot-check, not an exhaustive audit - 500 keys from any single node is still meaningful signal since the migration logic is identical across all nodes. A systematic bug would show up regardless of which node is sampled. The analysis phase already does proper per-node sampling.
- **No cleanup on null tempKey (type-handlers.ts:137)** - tempKey() only returns null for keys matching the user:{}:1 pattern (empty braces with no valid hash tag). This pattern is extremely rare in real workloads. To actually lose data you'd also need a network failure mid-write on one of those keys. The post-migration validation step would catch any partial writes.
Co-Authored-By: Claude <noreply@anthropic.com>
```ts
progress: job.progress,
createdAt: job.createdAt,
completedAt: job.completedAt,
error: job.error ?? (job.status === 'cancelled' ? 'Analysis timed out' : undefined),
```
User-cancelled analysis jobs show "timed out" error
Low Severity
In getJob, when job.error is undefined and job.status === 'cancelled', the returned error defaults to 'Analysis timed out'. But cancelJob (user-initiated cancellation) doesn't set job.error, so user-cancelled jobs are incorrectly reported as timed out. The code doesn't distinguish between user cancellation and stuck-job timeout cancellation.
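One way to make the distinction explicit — a hedged sketch in which `cancelReason` is a hypothetical field that `cancelJob` and the stuck-job sweeper would each set at cancellation time:

```typescript
type JobStatus = 'running' | 'completed' | 'failed' | 'cancelled';

// Derive the user-facing error from explicit state instead of inferring
// "timed out" from status alone. Unknown reasons default to user cancellation.
function reportedError(
  status: JobStatus,
  error?: string,
  cancelReason?: 'user' | 'timeout',
): string | undefined {
  if (error) return error;
  if (status !== 'cancelled') return undefined;
  return cancelReason === 'timeout' ? 'Analysis timed out' : 'Cancelled by user';
}
```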
The test docker-compose.test.yml maps Valkey to port 6390, but the migration e2e tests had port 6380 hardcoded, causing "Connection is closed" failures in CI. Co-Authored-By: Claude <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 18 total unresolved issues (including 17 from previous reviews).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
```ts
  `;

  return toml;
}
```
RedisShake TOML omits target cluster mode flag
Medium Severity
buildScanReaderToml accepts `sourceIsCluster` and conditionally adds `cluster = true` to the `[scan_reader]` section, but has no parameter for the target's cluster topology. In migration-execution.service.ts, `targetIsCluster` is correctly detected but never passed to the TOML builder. The generated `[redis_writer]` section will lack `cluster = true` when the target is a cluster, which would cause RedisShake to fail or misbehave when writing to a cluster target.
Additional Locations (1)
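The fix amounts to threading the flag through to the writer section. A sketch under assumed names (the real `buildScanReaderToml` signature may differ):

```typescript
// Build the [redis_writer] TOML section, mirroring how [scan_reader] already
// handles sourceIsCluster. Parameter names are illustrative.
function buildWriterSection(address: string, targetIsCluster: boolean): string {
  const lines = ['[redis_writer]', `address = "${address}"`];
  if (targetIsCluster) {
    lines.push('cluster = true'); // route writes across shards on cluster targets
  }
  return lines.join('\n');
}
```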
```ts
if (!/NOSCRIPT|unknown command|DENIED|NOPERM/i.test(msg)) {
  throw err;
}
await target.rename(tmp, key);
```
High — Non-atomic rename+expire creates a TTL-loss window
When EVAL is rejected by ACL, the fallback does two separate commands:

1. `RENAME tmp → key` (key now exists on target with NO expiry)
2. `PEXPIRE key pttl` (TTL applied here, ≥1 network round-trip later)
Any reader between those two commands will see the key without its original TTL. Worse, if the server crashes or the process is killed between the two commands, the key lives on the target indefinitely — even if the source had a 5-minute TTL.
Suggestions:
- Emit a warning log when the fallback fires so operators know atomicity was lost.
- Consider failing the key (returning `ok: false`) rather than silently accepting data-integrity loss, especially for TTL-bearing keys.
Suggested change:

```ts
// Only fall back for NOSCRIPT / unknown-command / ACL-denied errors.
// Transient errors (OOM, timeouts) should propagate.
if (!/NOSCRIPT|unknown command|DENIED|NOPERM/i.test(msg)) {
  throw err;
}
// Non-atomic fallback: TTL may be lost if the process is killed between
// RENAME and PEXPIRE. Callers should be aware this path sacrifices atomicity.
// eslint-disable-next-line no-console
console.warn(`[migration] EVAL blocked (${msg}); falling back to non-atomic RENAME+PEXPIRE for key — TTL may be lost on crash`);
await target.rename(tmp, key);
if (pttl > 0) {
  await target.pexpire(key, pttl);
} else if (pttl === -2 || pttl === 0) {
  await target.del(key);
}
```
```ts
function sanitizeLogLine(line: string): string {
  let sanitized = line;
  // 1. Quoted sensitive fields: password = "secret" or username:"admin"
  sanitized = sanitized.replace(
```
High — ReDoS in log sanitisation regex
The alternation (?:[^"\\\\]|\\\\.)* is a classic catastrophic backtracking pattern. For input like password = "aaaaaa... (with no closing quote), the regex engine will explore exponentially many paths. Because sanitizeLogLine runs on every line of RedisShake stdout/stderr output, a single malformed log line (e.g. triggered by a specially-crafted key/password) could pin the event loop.
Simplest safe fix — replace the quoted-value group with a lazy match that can never backtrack catastrophically:
```ts
sanitized = sanitized.replace(
  new RegExp(`(${SENSITIVE_KEYS.source})\\s*[=:]\\s*"[^"\\n]*"`, 'gi'),
  (match) => {
    const eqIdx = match.search(/[=:]/);
    return match.slice(0, eqIdx + 1) + ' "***"';
  },
);
```
`[^"\n]*` is a plain character class with no alternation, so there is nothing to backtrack between; it also stops at newlines, so a runaway input can only scan to end-of-line at worst.
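For illustration, the safe pattern in isolation — the `SENSITIVE_KEYS` value below is an assumption; the real constant lives in the sanitiser module:

```typescript
// Assumed field-name pattern for this sketch only.
const SENSITIVE_KEYS = /password|username|auth/;

// Redact quoted values of sensitive fields. The [^"\n]* class has no
// alternation, so matching stays linear even on unterminated quotes.
function sanitizeQuoted(line: string): string {
  return line.replace(
    new RegExp(`(${SENSITIVE_KEYS.source})\\s*[=:]\\s*"[^"\\n]*"`, 'gi'),
    (match) => {
      const eqIdx = match.search(/[=:]/);
      return match.slice(0, eqIdx + 1) + ' "***"';
    },
  );
}
```

A closed quote is redacted (`password = "hunter2"` becomes `password = "***"`); an unterminated quote simply fails to match.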
```ts
const key = keys[i];
const type = types[i];

if (type === 'none') {
```
Medium — Expired keys silently inflate keysProcessed, skewing progress %
When a key expires between SCAN and TYPE, it's counted toward `keysProcessed` (and thus the `job.progress` denominator), but it's neither transferred nor recorded as skipped. This means:

- Progress % is calculated correctly (the slot is "consumed")
- But `keysTransferred + keysSkipped < totalKeys` at the end, which will confuse operators reading the summary
The finish log already says `${job.keysTransferred} transferred, ${keysSkipped} skipped out of ${totalKeys} total`, so there'll be unexplained missing keys. Small fix:
```ts
if (type === 'none') {
  // Key expired between SCAN and TYPE — count as skipped so the summary adds up
  keysSkipped++;
  job.keysSkipped = keysSkipped;
  keysProcessed++;
  continue;
}
```
```ts
  return n;
}

function validateHost(host: string): string {
```
Medium — IPv6 scoped-link addresses bypass validation and break RedisShake
validateHost rejects whitespace/quotes/backslashes, but the character set [\s"\\] passes through % — which appears in IPv6 scoped-link addresses like fe80::1%eth0. formatAddress then wraps this as [fe80::1%eth0]:6379, which is syntactically invalid for Go's net.Dial and will cause RedisShake to fail to connect (the %eth0 scope must be URL-percent-encoded as fe80::1%25eth0 inside brackets).
```ts
function validateHost(host: string): string {
  if (!host || host.length > 253) {
    throw new Error('Invalid host: empty or too long');
  }
  if (/[\s"\\]/.test(host)) {
    throw new Error('Invalid host: contains whitespace, quotes, or backslashes');
  }
  // Percent-encode IPv6 zone IDs so Go's net.Dial accepts the address
  // e.g. "fe80::1%eth0" → "fe80::1%25eth0"
  return host.replace(/%(?!25)/g, '%25');
}
```
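A companion sketch of the address formatting the comment describes — `formatAddress` is the helper named above; this version assumes zone-ID encoding happens before bracketing:

```typescript
// Format host:port for RedisShake's TOML, bracketing IPv6 literals and
// percent-encoding zone IDs so Go's net.Dial can parse the result.
function formatAddress(host: string, port: number): string {
  const encoded = host.replace(/%(?!25)/g, '%25'); // fe80::1%eth0 → fe80::1%25eth0
  return encoded.includes(':') ? `[${encoded}]:${port}` : `${encoded}:${port}`;
}

console.log(formatAddress('fe80::1%eth0', 6379)); // [fe80::1%25eth0]:6379
console.log(formatAddress('10.0.0.5', 6379));     // 10.0.0.5:6379
```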
```ts
// Write PID file for orphan detection on server restart
const pidPath = join(os.tmpdir(), `${job.id}.pid`);
try {
  writeFileSync(pidPath, String(proc.pid), { encoding: 'utf-8', mode: 0o600 });
```
Medium — Orphaned redis-shake processes survive server restarts
The PID file is written here for "orphan detection," but there is no corresponding startup code that reads existing PID files and kills stale processes. If the BetterDB API restarts while a redis_shake job is running:

- The `ChildProcess` reference is lost (in-memory only)
- `redis-shake` continues running indefinitely, consuming source/target connections and CPU
- The PID file in `/tmp` is never cleaned up
- The job entry is gone from the new process's `jobs` Map, so it can't be cancelled through the API
Either implement onModuleInit() orphan cleanup, or accept that this approach offers no real protection and remove the PID file write to avoid misleading future maintainers:
```ts
// In MigrationExecutionService, add:
async onModuleInit() {
  const tmpDir = os.tmpdir();
  const pidFiles = readdirSync(tmpDir).filter(f => f.endsWith('.pid'));
  for (const f of pidFiles) {
    const pid = parseInt(readFileSync(join(tmpDir, f), 'utf-8'), 10);
    if (!isNaN(pid)) {
      try { process.kill(pid, 'SIGTERM'); } catch { /* already gone */ }
    }
    try { unlinkSync(join(tmpDir, f)); } catch { /* ignore */ }
  }
}
```

```ts
}

private evictOldJobs(): void {
  if (this.jobs.size < this.MAX_JOBS) return;
```
Medium — Job limit permanently deadlocks if running jobs never clean up
evictOldJobs only removes jobs in terminal states (completed, failed, cancelled). If the server was previously restarted mid-migration (and those in-flight jobs were lost from the Map), the new process starts fresh — so in practice this specific scenario won't deadlock. But there is a legitimate deadlock path:
- 10 simultaneous long-running `command` migrations are started
- None completes before a new request arrives
- `evictOldJobs` finds 0 terminal jobs and throws `ServiceUnavailableException`
- The API is now unable to accept any new migration until one of the 10 finishes
Since runCommandMigration is not cancellation-safe between arbitrary await points (only between batches), a stuck worker can block a slot indefinitely.
Consider also evicting jobs that have been running for longer than a configurable timeout:
```ts
private evictOldJobs(): void {
  if (this.jobs.size < this.MAX_JOBS) return;
  const RUNNING_TIMEOUT_MS = 4 * 60 * 60 * 1000; // 4 h
  const now = Date.now();
  const evictable = Array.from(this.jobs.entries())
    .filter(([, j]) =>
      j.status === 'completed' || j.status === 'failed' || j.status === 'cancelled' ||
      (j.status === 'running' && now - j.startedAt > RUNNING_TIMEOUT_MS),
    )
    .sort((a, b) => a[1].startedAt - b[1].startedAt);
  // ... rest unchanged
```
```ts
// ── Analysis endpoints (community-tier) ──

@Post('analysis')
```
Medium — Analysis endpoints have no authentication guard
The three analysis endpoints (POST /migration/analysis, GET /migration/analysis/:id, DELETE /migration/analysis/:id) carry no @UseGuards(...) decorator. If the app doesn't enforce a global auth guard (or if the middleware is opt-in), any unauthenticated caller can:
- Trigger unlimited analysis jobs against arbitrary connection IDs (connecting to registered databases)
- Enumerate job IDs by brute-forcing UUIDs
- Cancel other users' in-flight analysis jobs
The comment says analysis is "intentionally community-tier (no license guard)" — that refers to the license check, not authentication. Auth and licensing are orthogonal. Even community features should require a logged-in user.
If a global JWT/session guard is already applied at the app level, please add a comment noting that here so reviewers don't have to trace the middleware chain to confirm it.
```ts
// Lua script: atomically RENAME tmp→key and PEXPIRE in one round-trip.
// KEYS[1] = tmp, KEYS[2] = final key, ARGV[1] = pttl (or "-1" for no expiry, "-2" for expired)
const RENAME_WITH_TTL_LUA = `
```
Low — Lua script deletes a just-renamed key when pttl == 0
```lua
elseif pttl == -2 or pttl == 0 then
  redis.call('DEL', KEYS[2])
```

PTTL returns 0 only in the narrow window where the key's expiry is imminent (< 1 ms remaining). Treating that the same as -2 (key doesn't exist) causes the migrated key to be immediately deleted on the target. The key effectively won't survive the migration even though it existed on the source when we started reading it.
For consistency with the non-Lua path (applyTtl also has this same pttl === 0 branch), this should be reviewed: if the intent is "delete anything that's effectively expired", 0 makes sense; if the intent is "preserve keys that still exist", then pttl == 0 should either be treated as no-TTL (-1) or given a small grace period (e.g. 100 ms). Same change needed in applyTtl on line ~358.
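The grace-period option can be expressed as a small decision table. Names and the 100 ms threshold below are illustrative, not from the codebase:

```typescript
const GRACE_MS = 100; // assumed grace for keys caught at pttl === 0

type TtlDecision = { action: 'delete' | 'persist' | 'expire'; ms?: number };

// Map a PTTL reading taken on the source to the action applied on the target.
function ttlDecision(pttl: number): TtlDecision {
  if (pttl === -2) return { action: 'delete' };              // key already gone on source
  if (pttl === -1) return { action: 'persist' };             // no expiry
  if (pttl === 0) return { action: 'expire', ms: GRACE_MS }; // imminent expiry: keep briefly
  return { action: 'expire', ms: pttl };
}
```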
Review summary

Solid implementation overall — the phase structure is clear, the compatibility checks are thorough, and the test coverage is meaningfully broad for a feature this size. Eight issues flagged below (inline comments have the details and suggested fixes).

Issues by severity
Things that look good
Summary
[WIP] Migration: Pre-migration analysis, execution, and post-migration validation
Background
RedisShake and similar CLI tools can move data between instances but provide no compatibility analysis before migration, no integration with monitoring to verify health afterwards, and no UI. Cloud-provider native migration only works within a single provider's ecosystem. Self-hosted users and cross-provider migrations have no comparable tooling.
BetterDB's value is at steps 1 and 3. Step 2 is a commodity.
What's in this PR
Phase 1 - Pre-migration Analysis (Community)
Connects to two instances the user already has registered in BetterDB - source and target - and produces a migration readiness report without moving a single byte.
Source analysis (via SCAN sampling):
- `HRANDFIELD` + `HEXPIRETIME` pipeline with an `HLEN > 10k` guard for oversized hashes. Gracefully skips and marks `hfeSupported: false` on Redis sources where the command doesn't exist

Compatibility check (live, not static): Rather than a hardcoded Redis-vs-Valkey command list, the analysis reads `INFO` from both instances and derives concrete incompatibilities by comparing them directly. Ten checks are implemented:

Not yet tested:

- (`linux/amd64`, `linux/arm64`)

Known limitations (documented, not bugs)
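To make the "live, not static" idea concrete, here is a toy check in the same spirit. `maxmemory_policy` is a real `INFO` field, but this particular check is illustrative — not necessarily one of the ten implemented:

```typescript
// Pull a single field out of raw INFO text (CRLF-delimited key:value lines).
function parseInfoField(info: string, field: string): string | undefined {
  const m = info.match(new RegExp(`^${field}:(.*)$`, 'm'));
  return m?.[1].trim();
}

// Derive a concrete incompatibility by comparing both instances directly.
function checkMaxmemoryPolicy(sourceInfo: string, targetInfo: string): string | null {
  const src = parseInfoField(sourceInfo, 'maxmemory_policy');
  const tgt = parseInfoField(targetInfo, 'maxmemory_policy');
  return src && tgt && src !== tgt
    ? `maxmemory-policy differs: source=${src}, target=${tgt}`
    : null;
}
```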
Checklist
- `roborev review --branch` or `/roborev-review-branch` in Claude Code (internal)

Note
High Risk
Introduces new data-migration execution paths (including spawning external binaries and writing/renaming keys) plus new authenticated API endpoints, so bugs could cause data loss, incomplete migrations, or credential leakage if sanitization/validation is incorrect.
Overview
Adds a new `MigrationModule` with `/migration` endpoints for pre-migration analysis (community tier) and execution/validation (license-gated), including job lifecycle management (start/poll/cancel), progress tracking, and eviction of old jobs.

Implements analysis routines for key/type sampling, TTL distribution, command usage (COMMANDLOG/SLOWLOG), Hash Field Expiry detection, and source/target compatibility checks, and adds execution via either spawning RedisShake (with TOML generation + log parsing/sanitization) or an in-process command-based migrator with type-specific handlers and TTL preservation.
Updates container builds to bundle a checksum-verified RedisShake binary, adds an API test script and extensive unit/e2e coverage for the migration components, and expands `.gitignore` for build/tooling artifacts.

Written by Cursor Bugbot for commit f2bcab9. This will update automatically on new commits.